Profiles and fuzzy K-nearest neighbor algorithm for protein secondary structure prediction
Authors
Abstract
We introduce a new approach for predicting the secondary structure of proteins using profiles and the Fuzzy K-Nearest Neighbor algorithm. K-Nearest Neighbor methods perform better than Neural Networks or Hidden Markov Models when the query protein has few homologs in the sequence database from which to build a sequence profile. Although traditional K-Nearest Neighbor algorithms are a good choice in this situation, they have two drawbacks: all labeled samples are given equal importance when deciding the secondary structure class of a residue, and once a class has been assigned to a residue, there is no indication of the confidence in that assignment. In this paper, we propose a system based on the Fuzzy K-Nearest Neighbor algorithm that addresses both issues and outperforms earlier K-Nearest Neighbor methods that use multiple sequence alignments. We also introduce a new distance measure between two protein sequences and a new method for assigning membership values to the nearest neighbors in each of the Helix, Strand and Coil classes. Finally, we propose a novel heuristic-based filter to smooth the prediction. A particularly attractive feature of this filter is that it does not require retraining when new structures are added to the database. Our system achieves a sustained three-state overall accuracy of 75.75%. The software is available upon request.
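To make the two issues named in the abstract concrete, the following is a minimal sketch of the commonly used inverse-distance form of the fuzzy k-nearest neighbor rule, in which closer neighbors count more and the output is a graded membership per class rather than a single label. It is not the authors' system: the Euclidean distance, the toy two-dimensional feature vectors, the crisp (one-hot) neighbor memberships, and the names `fuzzy_knn`, `samples` and `query` are all assumptions made for illustration. The paper's own distance measure, membership assignment and smoothing filter are not described in the abstract and are not reproduced here.

```python
import math

CLASSES = ("H", "E", "C")  # helix, strand, coil


def fuzzy_knn(query, samples, k=3, m=2.0, eps=1e-9):
    """Return a dict of class memberships for `query`.

    samples: list of (feature_vector, memberships) pairs, where
             memberships maps each class in CLASSES to a value in [0, 1].
    m:       fuzzifier (> 1); larger values flatten the distance weighting.
    """
    # Euclidean distance is a placeholder assumption, not the paper's measure.
    nearest = sorted(
        ((math.dist(query, vec), memb) for vec, memb in samples),
        key=lambda pair: pair[0],
    )[:k]

    # Inverse-distance weights: closer neighbors get more influence.
    exponent = 2.0 / (m - 1.0)
    weights = [1.0 / (d ** exponent + eps) for d, _ in nearest]
    total = sum(weights)

    # Weighted average of the neighbors' class memberships.
    return {
        c: sum(w * memb[c] for w, (_, memb) in zip(weights, nearest)) / total
        for c in CLASSES
    }


# Hypothetical toy data: 2-D feature vectors with crisp labels.
helix = {"H": 1.0, "E": 0.0, "C": 0.0}
strand = {"H": 0.0, "E": 1.0, "C": 0.0}
coil = {"H": 0.0, "E": 0.0, "C": 1.0}
samples = [
    ((0.0, 0.0), helix),
    ((0.1, 0.0), helix),
    ((1.0, 1.0), strand),
    ((1.1, 1.0), strand),
    ((0.0, 1.0), coil),
]

query = (0.2, 0.1)
memberships = fuzzy_knn(query, samples, k=3)
predicted = max(memberships, key=memberships.get)
print(predicted, memberships)
```

The memberships sum to one, so the largest value can be read as a rough confidence in the predicted class, which is the information a plain majority-vote K-Nearest Neighbor classifier does not provide.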
Similar resources
Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method
Motivation: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially in the absence of significant sequence similarity of a query protein to those with known structures. The prediction of solvent accessibility is less accurate than secondary structure prediction in spite of improvements in recent research. The k-nearest neighbor meth...
Fuzzy K-nearest neighbor method to classify data in a closed area
Clustering of objects is an important area of research and application in a variety of fields. In this paper we present a technique for data clustering and apply it to data clustering in a closed area. We compare this method with K-nearest neighbor and K-means.
Drought Monitoring and Prediction using K-Nearest Neighbor Algorithm
Drought is a climate phenomenon that can occur in any climate condition and in any region on Earth. Effective drought management depends on the application of appropriate drought indices. Drought indices are variables used to detect and characterize drought conditions. This study attempts to predict drought occurrence, based on the standard precipitation index (SPI), usin...
Protein secondary structure prediction using distance based classifiers
De novo structure determination of proteins is a significant research issue in bioinformatics. Biochemical procedures for protein structure determination are costly. Various pattern classification techniques have been shown to ease this task. In this article, the secondary structure prediction task has been mapped into a three-class problem of pattern classification, where the classes are h...
A Novel Fuzzy Based Method for Heart Rate Variability Prediction
In this paper, a novel technique based on a fuzzy method is presented for chaotic nonlinear time series prediction. A fuzzy approach and a gradient learning algorithm constitute the main components of this method. The learning process is similar to the conventional gradient descent learning process, except that the input patterns and parameters are stored in mem...